4 research outputs found

    New debiasing strategies in collaborative filtering recommender systems: modeling user conformity, multiple biases, and causality.

    Recommender systems are widely used to personalize the user experience in a diverse set of online applications ranging from e-commerce and education to social media and online entertainment. These state-of-the-art AI systems can suffer from several biases that may occur at different stages of the recommendation life cycle. For instance, using biased data to train recommendation models may lead to several issues, such as a discrepancy between online and offline evaluation, decreased recommendation performance, and a worse user experience. Bias can occur during the data collection stage, where the data inherits user-item interaction biases such as selection and exposure bias. Bias can also occur in the training stage, where popular items tend to be recommended much more frequently because they received more interactions to start with. The closed feedback loop of online recommender systems further amplifies these biases. In this dissertation, we study bias in the context of collaborative filtering recommender systems and propose a new Popularity Correction Matrix Factorization (PCMF) that aims to improve recommender system performance as well as decrease popularity bias and increase the diversity of items in the recommendation lists. PCMF mitigates popularity bias by disentangling relevance and conformity and by learning a user-personalized bias vector to capture the users' individual conformity levels along a full spectrum of conformity bias. One shortcoming of the proposed PCMF debiasing approach is its assumption that the recommender system is affected only by popularity bias. However, in the real world, different types of bias occur simultaneously and interact with one another. We therefore relax this assumption and propose a multi-pronged approach that can account for two biases simultaneously, namely popularity and exposure bias. Our experimental results show that accounting for multiple biases yields more accurate and less biased results. Finally, we propose a novel two-stage debiasing approach inspired by the proximal causal inference framework. Unlike the existing causal inverse propensity scoring (IPS) approach, which corrects for observed confounders, our proposed approach corrects for both observed and potential unobserved confounders. The approach relies on a pair of negative control variables to adjust for the bias in the potential ratings. Our proposed approach outperforms state-of-the-art causal approaches, showing that accounting for unobserved confounders can improve the recommendation system's performance.
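For readers unfamiliar with the IPS baseline mentioned in the abstract above, the following is a minimal sketch of an inverse-propensity-weighted error estimate. It is not the dissertation's method or code. The function name `ips_mse`, the dictionary-based data layout, and the toy numbers are all assumptions made for illustration. The idea is that each observed rating's squared error is divided by the probability that the rating was observed, so that rarely exposed user-item pairs count for more.

```python
def ips_mse(observed, predictions, propensities, n_users, n_items):
    """Inverse-propensity-scored estimate of the mean squared error.

    observed:     {(user, item): rating} for the ratings that were logged
    predictions:  {(user, item): predicted rating}
    propensities: {(user, item): probability that the rating was observed}
    All three structures are hypothetical; a real system would estimate
    the propensities from exposure or popularity data.
    """
    total = 0.0
    for (user, item), rating in observed.items():
        p = propensities[(user, item)]
        if not 0.0 < p <= 1.0:
            raise ValueError("propensity must lie in (0, 1]")
        # Up-weight errors on pairs that were unlikely to be observed.
        total += (predictions[(user, item)] - rating) ** 2 / p
    # Normalize by the full user-item matrix, not just the observed cells.
    return total / (n_users * n_items)


# Toy example: 2 users x 2 items, only two ratings observed.
observed = {(0, 0): 5.0, (1, 1): 1.0}
predictions = {(0, 0): 4.0, (1, 1): 3.0}
propensities = {(0, 0): 0.5, (1, 1): 0.25}
estimate = ips_mse(observed, predictions, propensities, n_users=2, n_items=2)
```

In this toy case the error on the rarely observed pair `(1, 1)` dominates the estimate, which is the intended effect of the weighting. Note that this correction only uses observed propensities; it does not address the unobserved confounders that the abstract's two-stage approach targets.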

    Clustering heterogeneous autism spectrum disorder data.

    Autism spectrum disorder (ASD) is a developmental disorder that affects communication and behavior. Several studies have been conducted in recent years to develop a better understanding of the disorder, and therefore better diagnosis and treatment, by analyzing diverse data sets consisting of behavioral surveys and tests, phenotype descriptions, and brain imagery. However, data analysis is challenged by the diversity, complexity, and heterogeneity of patient cases and by the need to integrate diverse data sets to reach a better understanding of ASD. The aim of our study is to mine homogeneous groups of patients from a heterogeneous set of data consisting of both ADOS and behavioral data sets and to interpret the discovered clusters within the medical context of the affected brain areas using fMRI data. We developed an unsupervised machine learning pipeline to mine a heterogeneous data set consisting of standardized Autism Diagnostic Observation Schedule (ADOS) scores, which are metrics used to measure autism severity; phenotypical and behavioral data, which are used to identify behavioral problems in autistic patients; and functional Magnetic Resonance Imaging (fMRI), a technique for measuring and mapping brain activity. Our Big Data pipeline uses different clustering algorithms to partition the patients into homogeneous groups: hierarchical clustering, spectral clustering, and spectral co-clustering. In addition, we design a general framework that adds explainability to clustering algorithms in a way that helps the end user make sense of the clustering outputs by answering their questions about the results relative to the input data itself as well as available external evidence. Our clustering algorithms were able to discover homogeneous groups of patients that share similar behavioral and phenotypical characteristics. Furthermore, we generate an accessible interpretation of clustering results by mapping the discovered clusters onto the brain structure. Through our clustering and explanation modules, our unsupervised machine learning methodology enables domain experts to perform a powerful analysis on homogeneous cases, such as discovering hidden associations between the genetic data of patients belonging to the same cluster, in order to better understand ASD and to pave the way toward data-driven personalized medicine.
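As a rough illustration of the hierarchical clustering step named in the abstract above, the sketch below implements average-linkage agglomerative clustering using only the Python standard library. It is not the authors' pipeline. The function name `agglomerative`, the choice of average linkage with Euclidean distance, and the two-dimensional toy vectors (stand-ins for per-patient feature vectors) are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering.

    points:     list of equal-length numeric tuples (hypothetical
                per-patient feature vectors)
    n_clusters: number of groups to stop at
    Returns a list of clusters, each a list of indices into `points`.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster y into cluster x; x < y, so deleting y is safe.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy data: two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = agglomerative(toy_points, n_clusters=2)
```

This brute-force version recomputes every linkage on each merge, so it is suitable only for small examples. A real analysis would more likely rely on an established clustering library and on features derived from the ADOS, behavioral, and fMRI data described in the abstract.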